We are going to perform a quality control (QC) analysis of the mapping results obtained by running cellranger version 7.0.0.
We will pull together all the libraries from all the CSF subprojects.
## [1] "CSF_01"
## [1] "4608"
## [1] "4839"
## [1] "5700"
## [1] "5792"
## [1] "5929"
## [1] "CSF_02"
## [1] "7921"
## [1] "7974"
We will start by showing the most relevant metrics (number of reads, estimated number of recovered cells, fraction of reads in cells, mean reads per cell, fraction of reads mapped to exonic regions, and median genes per cell) obtained by cellranger for each of the working libraries. This information will give us an idea of the quality of the experiment, as well as of the sequencing and mapping steps.
GEX QC metrics (cellranger v 7.0.0)

| Subproject | GemID | Cells | Median UMI counts per cell | Median genes per cell | Median reads per cell | Total genes detected | Number of reads |
|---|---|---|---|---|---|---|---|
| CSF_01 | 4608 | 3984 | 1623 | 858 | 19472 | 22942 | 266.97M |
| CSF_01 | 4839 | 3504 | 2964 | 1303 | 41738 | 22837 | 306.87M |
| CSF_01 | 5700 | 8989 | 2310 | 1049 | 16548 | 23935 | 298.90M |
| CSF_01 | 5792 | 7602 | 3659 | 1517 | 22742 | 25079 | 303.81M |
| CSF_01 | 5929 | 6962 | 3095 | 1319 | 16182 | 24309 | 261.68M |
| CSF_02 | 7921 | 4813 | 3998 | 1615 | 30334 | 23910 | 315.94M |
| CSF_02 | 7974 | 10196 | 3454 | 1565 | 12189 | 25758 | 231.41M |
Next, we will check the quality of the mapping step performed by
cellranger 7.0.0 across libraries. To do so, we will
compare the percentage of reads mapped to the genome and, within these
mapped reads, the percentage mapped to intergenic, intronic
and exonic regions. We aim to obtain libraries with a high percentage of
confidently mapped reads, and especially a high percentage of exonic
reads, which reflect gene expression. Reads mapping
to intergenic regions suggest contamination with ambient DNA, whereas
reads mapping to intronic regions may come from pre-mRNAs or from mature
spliced isoforms that retain certain introns.
## [1] "Confidently_mapped_to_genome"
## [1] "Confidently_mapped_to_intergenic_regions"
## [1] "Confidently_mapped_to_intronic_regions"
## [1] "Confidently_mapped_to_exonic_regions"
## [1] "Confidently_mapped_antisense"
## [1] "Confidently_mapped_to_transcriptome"
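The comparison described above could be sketched in base R as follows. The column names are the ones printed above; the library IDs, the percentages and the 80% cutoff are placeholders chosen for illustration only and are not taken from this report.

```r
# Placeholder QC table: values are made up for illustration, NOT from this report.
qc <- data.frame(
  GemID = c("lib_A", "lib_B"),
  Confidently_mapped_to_genome = c(92.0, 78.5),
  Confidently_mapped_to_intergenic_regions = c(4.0, 9.5),
  Confidently_mapped_to_intronic_regions = c(20.0, 30.0),
  Confidently_mapped_to_exonic_regions = c(68.0, 39.0),
  stringsAsFactors = FALSE
)

# Reshape from wide (one column per category) to long (one row per
# library and category), which is convenient for a stacked bar plot.
to_long <- function(qc, id_col = "GemID") {
  metric_cols <- setdiff(names(qc), id_col)
  blocks <- lapply(metric_cols, function(col) {
    data.frame(
      GemID = qc[[id_col]],
      category = col,
      pct_reads = qc[[col]],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, blocks)
}

qc_long <- to_long(qc)

# Hypothetical cutoff: flag libraries with < 80% reads confidently mapped to the genome.
min_genome_pct <- 80
low_mapping <- qc_long$GemID[
  qc_long$category == "Confidently_mapped_to_genome" &
    qc_long$pct_reads < min_genome_pct
]
print(low_mapping)
```

Any cutoff of this kind is a judgment call and should be adapted to the tissue and chemistry at hand.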
After assessing the mapped reads, it is important to check the sequencing saturation and depth of each library. Sequencing saturation depends on library complexity and sequencing depth. Library complexity is the total number of different transcripts present in the library, and it varies between cell types and tissues, whereas sequencing depth is the number of paired reads per cell. For this reason, we will plot the number of detected genes as a function of depth (sequenced reads). As sequencing depth increases, more genes are detected, but this function reaches a plateau at which more sequenced reads do not result in more detected genes; at that point, we can be confident that the library was sequenced to saturation. More specifically, sequencing saturation is the fraction of confidently mapped, valid cell-barcode, valid UMI reads that had a non-unique (cell-barcode, UMI, gene) combination.
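To make the definition of sequencing saturation concrete, here is a minimal base R sketch. It assumes a hypothetical table with one row per confidently mapped read that has a valid barcode and UMI; the column names `barcode`, `umi` and `gene` and the toy reads are invented for illustration.

```r
# Saturation = 1 - (distinct (barcode, UMI, gene) combinations / total reads),
# i.e. the fraction of reads that duplicate an already observed combination.
sequencing_saturation <- function(reads) {
  n_reads <- nrow(reads)
  n_deduped <- nrow(unique(reads[, c("barcode", "umi", "gene")]))
  1 - n_deduped / n_reads
}

# Toy reads (made up): 5 reads, 3 distinct (barcode, UMI, gene) combinations.
toy_reads <- data.frame(
  barcode = c("AAAC", "AAAC", "AAAC", "TTTG", "TTTG"),
  umi     = c("U1",   "U1",   "U2",   "U1",   "U1"),
  gene    = c("CD3E", "CD3E", "CD3E", "CD19", "CD19"),
  stringsAsFactors = FALSE
)

saturation <- sequencing_saturation(toy_reads)
print(saturation)
```

A value close to 1 means that additional reads would mostly re-sequence molecules that were already observed, whereas a value close to 0 means that most reads still reveal new molecules.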
We will start by showing the most relevant metrics (number of reads, estimated number of recovered cells, fraction of reads in cells, mean reads per cell, fraction of reads mapped to any V(D)J gene, and cells with productive V-J spanning pair) obtained by cellranger for each of the VDJ-T libraries. This information will give us an idea of the quality of the experiment, as well as of the sequencing and mapping steps.

VDJ-T QC metrics (cellranger v 7.0.0)

| Subproject | GemID | Number of reads | Estimated number of cells | Fraction reads in cells (%) | Mean reads per cell | Reads mapped to any V(D)J gene (%) | Cells with productive V-J spanning pair (%) |
|---|---|---|---|---|---|---|---|
| CSF_01 | 4608 | 13071011 | 815 | 34.80 | 0.02M | 38.45 | 58.65 |
| CSF_01 | 4839 | 15751069 | 1912 | 77.62 | 0.01M | 72.19 | 77.62 |
| CSF_01 | 5700 | 14127176 | 5719 | 62.66 | 0.00M | 51.65 | 64.84 |
| CSF_01 | 5792 | 18090339 | 2729 | 43.33 | 0.01M | 31.17 | 69.11 |
| CSF_01 | 5929 | 14573775 | 4297 | 52.51 | 0.00M | 82.88 | 80.78 |
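Because the "Mean reads per cell" column is displayed in millions, most of its resolution is lost (for example `0.00M`). The sketch below recomputes it at single-read resolution from the other two columns of the table. It assumes that mean reads per cell is simply the number of reads divided by the estimated number of cells, which is consistent with the rounded values shown above; the data frame and column names are illustrative.

```r
# Read and cell counts copied from the VDJ-T QC metrics table above.
vdj_t <- data.frame(
  GemID = c("4608", "4839", "5700", "5792", "5929"),
  number_of_reads = c(13071011, 15751069, 14127176, 18090339, 14573775),
  estimated_cells = c(815, 1912, 5719, 2729, 4297),
  stringsAsFactors = FALSE
)

# Assumption: mean reads per cell = number of reads / estimated number of cells.
vdj_t$mean_reads_per_cell <- round(vdj_t$number_of_reads / vdj_t$estimated_cells)
print(vdj_t)
```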
Next, we will check the quality of the V(D)J mapping step performed
by cellranger 7.0.0 across libraries. To do so, we will
compare the percentage of reads mapped to any germline V(D)J gene
segment and, within these mapped reads, the percentage mapped to TRA
and TRB germline gene segments.
Here, we will assess the median number of UMIs assigned to a TRA/TRB contig per cell. Low values for either parameter can indicate cells with extremely low TRA/TRB expression or poor cell quality, among other causes.
VDJ-T expression (cellranger v 7.0.0)

| GemID | Median TRA UMIs per cell | Median TRB UMIs per cell |
|---|---|---|
| 4608 | 2 | 4 |
| 4839 | 3 | 7 |
| 5700 | 3 | 8 |
| 5792 | 3 | 7 |
| 5929 | 5 | 10 |
Now, we will check the V(D)J annotation for the studied samples. To better interpret the results, we will use the information given in the cellranger web summary file. We will assess the following fractions of cell-associated barcodes (with at least…):
Cells With TRA/TRB Contig: one TRA/TRB contig annotated as a full or partial V(D)J gene.
Cells With CDR3-annotated TRA/TRB Contig: one TRA/TRB contig where a CDR3 was detected.
Cells With Productive TRA/TRB Contig: one contig that spans the 5’ end of the V region to the 3’ end of the J region for TRA/TRB, has a start codon in the expected part of the V sequence, has an in-frame CDR3, and has no stop codons in the aligned V-J region.
Cells With Productive V-J Spanning Pair: one productive contig for each chain of the receptor pair. We also report the corresponding number of cells with a productive V-J spanning pair.
For all the previous parameters, low values can indicate poor cell quality, low yield from the RT reaction, or poor specificity of the V(D)J enrichment. Moreover, we will also check:
V(D)J annotation (cellranger v 7.0.0)

| GEM ID | Estimated Number of Recovered Cells | Productive V-J Spanning Pair: Fraction (%) | Productive V-J Spanning Pair: Cells | Paired Clonotype Diversity | Productive contig: TRA (%) | Productive contig: TRB (%) |
|---|---|---|---|---|---|---|
| 4608 | 815 | 58.65 | 478 | 109.05 | 65.28 | 93.37 |
| 4839 | 1912 | 77.62 | 1484 | 457.31 | 80.65 | 96.97 |
| 5700 | 5719 | 64.84 | 3708 | 3103.42 | 68.86 | 95.98 |
| 5792 | 2729 | 69.11 | 1886 | 998.72 | 72.63 | 96.48 |
| 5929 | 4297 | 80.78 | 3471 | 648.62 | 83.17 | 97.60 |
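As a quick sanity check on the table above, the fraction of cells with a productive V-J spanning pair should equal the number of such cells divided by the estimated number of recovered cells. The following sketch verifies this with the values copied from the table; the data frame, its column names and the 0.01 tolerance are illustrative choices.

```r
# Values copied from the V(D)J annotation table above.
vdj_t_annot <- data.frame(
  GemID = c("4608", "4839", "5700", "5792", "5929"),
  estimated_cells = c(815, 1912, 5719, 2729, 4297),
  cells_with_pair = c(478, 1484, 3708, 1886, 3471),
  reported_pct = c(58.65, 77.62, 64.84, 69.11, 80.78),
  stringsAsFactors = FALSE
)

# Recompute the percentage and compare it with the reported one,
# allowing for rounding to two decimals.
vdj_t_annot$computed_pct <-
  100 * vdj_t_annot$cells_with_pair / vdj_t_annot$estimated_cells
vdj_t_annot$consistent <-
  abs(vdj_t_annot$computed_pct - vdj_t_annot$reported_pct) < 0.01
print(vdj_t_annot)
```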
We will start by showing the most relevant metrics (number of reads, estimated number of recovered cells, fraction of reads in cells, mean reads per cell, fraction of reads mapped to any V(D)J gene, and cells with productive V-J spanning pair) obtained by cellranger for each of the VDJ-B (BCR) libraries. This information will give us an idea of the quality of the experiment, as well as of the sequencing and mapping steps.

BCR-V(D)J QC metrics (cellranger v 7.0.0)

| GEM ID | Number of Reads | Estimated Number of Recovered Cells | Fraction of Reads in Cells | Mean Reads per Cell | Fraction of Reads Mapped to any VDJ gene | Cells With Productive V-J Spanning Pair |
|---|---|---|---|---|---|---|
| 4839 | 13.55M | 13 | 48.3% | 1042457 | 47.6% | 13 |
| 5929 | 13.87M | 414 | 92.0% | 33499 | 90.5% | 387 |
Next, we will check the quality of the V(D)J mapping step performed
by cellranger 7.0.0 across libraries. To do so, we will
compare the percentage of reads mapped to any germline V(D)J gene
segment and, within these mapped reads, the percentage mapped to IGH,
IGK and IGL germline gene segments.
Here, we will assess the median number of UMIs assigned to an IGH/IGK/IGL contig per cell. Low values for any of the three parameters can indicate cells with extremely low IGH/IGK/IGL expression or poor cell quality, among other causes.
V(D)J expression (cellranger v 7.0.0)

| GEM ID | Median IGH UMIs per Cell | Median IGK UMIs per Cell | Median IGL UMIs per Cell |
|---|---|---|---|
| 4839 | 1995.0 | 7406 | 85 |
| 5929 | 2185.5 | 4343 | 5830 |
Now, we will check the V(D)J annotation for the studied samples. To better interpret the results, we will use the information given in the cellranger web summary file. We will assess the following fractions of cell-associated barcodes (with at least…):
Cells With IGH/IGK/IGL Contig: one IGH/IGK/IGL contig annotated as a full or partial V(D)J gene.
Cells With CDR3-annotated IGH/IGK/IGL Contig: one IGH/IGK/IGL contig where a CDR3 was detected.
Cells With Productive IGH/IGK/IGL Contig: one contig that spans the 5’ end of the V region to the 3’ end of the J region for IGH/IGK/IGL, has a start codon in the expected part of the V sequence, has an in-frame CDR3, and has no stop codons in the aligned V-J region.
Cells With Productive V-J Spanning Pair: one productive contig for each chain of the receptor pair. We also report the corresponding number of cells with a productive V-J spanning pair.
For all the previous parameters, low values can indicate poor cell quality, low yield from the RT reaction, or poor specificity of the V(D)J enrichment. Moreover, we will also check:
V(D)J annotation (cellranger v 7.0.0)

| Subproject | GEM ID | Productive contig: IGH (%) | Productive contig: IGK (%) | Productive contig: IGL (%) | Productive V-J Spanning Pair: Fraction IGK IGH Pair (%) | Productive V-J Spanning Pair: Fraction IGL IGH Pair (%) | Productive V-J Spanning Pair: Cells | Cells_with_productive_V-J_spanning_pair (%) | Estimated Number of Recovered Cells | Paired Clonotype Diversity | Reads_mapped_to_IGH (%) | Reads_mapped_to_IGK (%) | Reads_mapped_to_IGL (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CSF_01 | 4839 | 100.00 | 61.54 | 38.46 | 61.54 | 38.46 | 13 | 100.00 | 13 | 13.00 | 21.08 | 24.58 | 1.83 |
| CSF_01 | 5929 | 93.48 | 67.39 | 33.09 | 63.04 | 30.92 | 387 | 93.48 | 414 | 50.95 | 33.41 | 38.64 | 18.40 |
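To help read the "Paired Clonotype Diversity" column, the sketch below assumes that this metric is the inverse Simpson index of the paired clonotype frequencies, as we understand the cellranger documentation to define it. Under that assumption, the index ranges from 1 (a single clonotype) to the number of cells (every cell carries a different clonotype). The function name and the clonotype counts are illustrative, not cellranger output.

```r
# Inverse Simpson index: 1 / sum(p_i^2), where p_i is the frequency of clonotype i.
inverse_simpson <- function(clonotype_counts) {
  p <- clonotype_counts / sum(clonotype_counts)
  1 / sum(p^2)
}

# 13 cells, each with a different clonotype: maximal diversity.
max_diversity <- inverse_simpson(rep(1, 13))
# 13 cells, all sharing one clonotype: minimal diversity.
min_diversity <- inverse_simpson(13)

print(c(max = max_diversity, min = min_diversity))
```

Read this way, the value of 13.00 reported for library 4839, which has 13 recovered cells, would be consistent with every cell carrying a distinct paired clonotype.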
## [1] "Libraries metadata"
## project subproject gem_id library_id library_name library_barcode hashing
## 1 CSF CSF_01 4608 276966 4608_GEX AZ8142 not_hashed
## 2 CSF CSF_01 4839 276967 4839_GEX AZ8143 not_hashed
## 3 CSF CSF_01 5929 276968 5929_GEX AZ8144 not_hashed
## 4 CSF CSF_01 5700 276969 5700_GEX AZ8145 not_hashed
## 5 CSF CSF_01 5792 276970 5792_GEX AZ8146 not_hashed
## 6 CSF CSF_01 4608 277405 4608_TCR AZ8390 not_hashed
## 7 CSF CSF_01 4839 277406 4839_TCR AZ8391 not_hashed
## 8 CSF CSF_01 5929 277407 5929_TCR AZ8392 not_hashed
## 9 CSF CSF_01 5700 277408 5700_TCR AZ8393 not_hashed
## 10 CSF CSF_01 5792 277409 5792_TCR AZ8394 not_hashed
## 11 CSF CSF_01 4839 277410 4839_BCR AZ8396 not_hashed
## 12 CSF CSF_01 5929 277411 5929_BCR AZ8397 not_hashed
## 13 CSF CSF_02 7921 277387 7921_GEX AZ7864 not_hashed
## 14 CSF CSF_02 7974 277388 7974_GEX AZ7865 not_hashed
## type donor_id wet_lab
## 1 cDNA 4608_GEX 4608_GEX
## 2 cDNA 4839_GEX 4839_GEX
## 3 cDNA 5929_GEX 5929_GEX
## 4 cDNA 5700_GEX 5700_GEX
## 5 cDNA 5792_GEX 5792_GEX
## 6 VDJ-T 4608_TCR 4608_TCR
## 7 VDJ-T 4839_TCR 4839_TCR
## 8 VDJ-T 5929_TCR 5929_TCR
## 9 VDJ-T 5700_TCR 5700_TCR
## 10 VDJ-T 5792_TCR 5792_TCR
## 11 VDJ-B 4839_BCR 4839_BCR
## 12 VDJ-B 5929_BCR 5929_BCR
## 13 cDNA 7921_GEX 7921_GEX
## 14 cDNA 7974_GEX 7974_GEX
## [1] "GEX QC summary table"
## # A tibble: 7 x 26
## Subproject GemID Cells Confi~1 Media~2 Media~3 Media~4 Total~5 Numbe~6 Numbe~7
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CSF_01 4608 3984 85.8 1623 858 19472 22942 2.67e8 0
## 2 CSF_01 4839 3504 93.3 2964 1303 41738 22837 3.07e8 0
## 3 CSF_01 5700 8989 93.0 2310 1049 16548 23935 2.99e8 0
## 4 CSF_01 5792 7602 91.8 3659 1517 22742 25079 3.04e8 0
## 5 CSF_01 5929 6962 79.4 3095 1319 16182 24309 2.62e8 0
## 6 CSF_02 7921 4813 93.8 3998 1615 30334 23910 3.16e8 0
## 7 CSF_02 7974 10196 89.6 3454 1565 12189 25758 2.31e8 0
## # ... with 16 more variables: Q30_RNA_read <dbl>, Q30_UMI <dbl>,
## # Q30_barcodes <dbl>, Confidently_mapped_antisense <dbl>,
## # Confidently_mapped_to_exonic_regions <dbl>,
## # Confidently_mapped_to_genome <dbl>,
## # Confidently_mapped_to_intergenic_regions <dbl>,
## # Confidently_mapped_to_intronic_regions <dbl>,
## # Confidently_mapped_to_transcriptome <dbl>, ...
## [1] "VDJ-T QC summary table"
## # A tibble: 5 x 23
## Subpro~1 GemID Cells~2 Cells~3 Cells~4 Cells~5 Estim~6 Media~7 Media~8 Numbe~9
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CSF_01 4608 65.3 93.4 58.6 58.6 815 2 4 478
## 2 CSF_01 4839 80.6 97.0 77.6 77.6 1912 3 7 1484
## 3 CSF_01 5700 68.9 96.0 64.8 64.8 5719 3 8 3708
## 4 CSF_01 5792 72.6 96.5 69.1 69.1 2729 3 7 1886
## 5 CSF_01 5929 83.2 97.6 80.8 80.8 4297 5 10 3471
## # ... with 13 more variables: Paired_clonotype_diversity <dbl>,
## # Number_of_reads <dbl>, Number_of_short_reads_skipped <dbl>,
## # Q30_RNA_read <dbl>, Q30_UMI <dbl>, Q30_barcodes <dbl>,
## # Fraction_reads_in_cells <dbl>, Mean_reads_per_cell <dbl>,
## # Mean_used_reads_per_cell <dbl>, Reads_mapped_to_TRA <dbl>,
## # Reads_mapped_to_TRB <dbl>, Reads_mapped_to_any_V_D_J_gene <dbl>,
## # Valid_barcodes <dbl>, and abbreviated variable names 1: Subproject, ...
## [1] "VDJ-B QC summary table"
## # A tibble: 2 x 27
## Subpro~1 GemID Cells~2 Cells~3 Cells~4 Cells~5 Cells~6 Cells~7 Estim~8 Media~9
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 CSF_01 4839 100 61.5 38.5 61.5 38.5 100 13 1995
## 2 CSF_01 5929 93.5 67.4 33.1 63.0 30.9 93.5 414 2186.
## # ... with 17 more variables: Median_IGK_UMIs_per_Cell <dbl>,
## # Median_IGL_UMIs_per_Cell <dbl>,
## # `Number_of_cells_with_productive_V-J_spanning_pair` <dbl>,
## # Paired_clonotype_diversity <dbl>, Number_of_reads <dbl>,
## # Number_of_short_reads_skipped <dbl>, Q30_RNA_read <dbl>, Q30_UMI <dbl>,
## # Q30_barcodes <dbl>, Fraction_reads_in_cells <dbl>,
## # Mean_reads_per_cell <dbl>, Mean_used_reads_per_cell <dbl>, ...
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-conda-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS/LAPACK: /scratch/groups/singlecell/software/anaconda3/envs/csf/lib/libopenblasp-r0.3.21.so
##
## locale:
## [1] C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] RColorBrewer_1.1-3 magrittr_2.0.3 glue_1.6.2 ggrepel_0.9.3
## [5] ggpubr_0.6.0 gt_0.8.0 forcats_0.5.2 stringr_1.5.0
## [9] dplyr_1.0.10 purrr_1.0.1 readr_2.1.3 tidyr_1.3.0
## [13] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2 rmarkdown_2.20
##
## loaded via a namespace (and not attached):
## [1] fs_1.6.0 lubridate_1.9.0 httr_1.4.4
## [4] tools_4.2.2 backports_1.4.1 bslib_0.4.2
## [7] utf8_1.2.2 R6_2.5.1 DBI_1.1.3
## [10] colorspace_2.1-0 withr_2.5.0 tidyselect_1.2.0
## [13] processx_3.8.0 compiler_4.2.2 cli_3.6.0
## [16] rvest_1.0.3 xml2_1.3.3 labeling_0.4.2
## [19] sass_0.4.5 scales_1.2.1 callr_3.7.3
## [22] commonmark_1.8.1 digest_0.6.31 dichromat_2.0-0.1
## [25] pkgconfig_2.0.3 htmltools_0.5.4 dbplyr_2.3.0
## [28] fastmap_1.1.0 highr_0.10 maps_3.4.1
## [31] rlang_1.0.6 readxl_1.4.1 pals_1.7
## [34] jquerylib_0.1.4 generics_0.1.3 farver_2.1.1
## [37] jsonlite_1.8.4 car_3.1-2 googlesheets4_1.0.1
## [40] patchwork_1.1.2 Rcpp_1.0.10 munsell_0.5.0
## [43] fansi_1.0.4 abind_1.4-5 lifecycle_1.0.3
## [46] stringi_1.7.12 yaml_2.3.7 carData_3.0-5
## [49] grid_4.2.2 promises_1.2.0.1 crayon_1.5.2
## [52] cowplot_1.1.1 haven_2.5.1 mapproj_1.2.11
## [55] hms_1.1.2 knitr_1.41 ps_1.7.2
## [58] pillar_1.8.1 ggsignif_0.6.4 reprex_2.0.2
## [61] evaluate_0.20 getPass_0.2-2 modelr_0.1.10
## [64] vctrs_0.5.2 tzdb_0.3.0 httpuv_1.6.8
## [67] cellranger_1.1.0 gtable_0.3.1 assertthat_0.2.1
## [70] cachem_1.0.6 xfun_0.36 broom_1.0.2
## [73] rstatix_0.7.2 later_1.3.0 googledrive_2.0.0
## [76] gargle_1.2.1 workflowr_1.7.0 timechange_0.2.0
## [79] ellipsis_0.3.2